2.
Sci Adv ; 10(4): eadj3786, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38266077

ABSTRACT

Adeno-associated viruses (AAVs) hold tremendous promise as delivery vectors for gene therapies. AAVs have been successfully engineered (for instance, for more efficient and/or cell-specific delivery to numerous tissues) by creating large, diverse starting libraries and selecting for desired properties. However, these starting libraries often contain a high proportion of variants unable to assemble or package their genomes, a prerequisite for any gene delivery goal. Here, we present and showcase a machine learning (ML) method for designing AAV peptide insertion libraries that achieve fivefold higher packaging fitness than the standard NNK library with negligible reduction in diversity. To demonstrate our ML-designed library's utility for downstream engineering goals, we show that it yields approximately 10-fold more successful variants than the NNK library after selection for infection of human brain tissue, leading to a promising glial-specific variant. Moreover, our design approach can be applied to other types of libraries for AAV and beyond.
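
The abstract does not define the NNK baseline. Under the standard degenerate-codon convention (N = any of A, C, G, T; K = G or T), it can be enumerated as in the minimal sketch below. The function names and the insert length of 7 used in the usage note are illustrative assumptions, not details taken from the abstract, and the sketch says nothing about the ML design method itself.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def nnk_amino_acid_counts():
    """Number of NNK codons encoding each amino acid ('*' marks a stop)."""
    return Counter(CODON_TABLE[codon] for codon in nnk_codons())


def prob_no_stop(insert_length):
    """Probability that a uniformly sampled NNK insert contains no stop codon.

    insert_length is a hypothetical parameter chosen by the caller.
    """
    counts = nnk_amino_acid_counts()
    stop_fraction = counts["*"] / sum(counts.values())
    return (1 - stop_fraction) ** insert_length
```

Calling `prob_no_stop(7)` gives the stop-free fraction for an assumed 7-residue insert. Stop codons are only one source of non-viable variants; packaging fitness as measured in the study depends on the full peptide sequence.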


Subjects
Dependovirus , Genetic Therapy , Humans , Dependovirus/genetics , Peptide Library , Brain , Machine Learning
3.
Article in English | MEDLINE | ID: mdl-38052497

ABSTRACT

Machine learning-based design has gained traction in the sciences, most notably in the design of small molecules, materials, and proteins, with societal applications ranging from drug development and plastic degradation to carbon sequestration. When designing objects to achieve novel property values with machine learning, one faces a fundamental challenge: how to push past the frontier of current knowledge, distilled from the training data into the model, in a manner that rationally controls the risk of failure. If one trusts learned models too much in extrapolation, one is likely to design rubbish. In contrast, if one does not extrapolate, one cannot find novelty. Herein, we ponder how one might strike a useful balance between these two extremes. We focus in particular on designing proteins with novel property values, although much of our discussion is relevant to machine learning-based design more broadly.


Subjects
Machine Learning
4.
Science ; 382(6671): 669-674, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37943906

ABSTRACT

Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients without making any assumptions about the machine-learning algorithm that supplies the predictions. Furthermore, more accurate predictions translate to smaller confidence intervals. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning. The benefits of prediction-powered inference were demonstrated with datasets from proteomics, astronomy, genomics, remote sensing, census analysis, and ecology.
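
The abstract gives no formulas. As a minimal sketch for the simplest quantity it mentions, a mean, the interval below combines the average prediction on a large unlabeled set with a bias correction estimated on the small labeled set. It uses a normal approximation, and the function name, argument names, and this particular form of the estimator are assumptions made for illustration rather than a transcription of the paper's algorithm.

```python
import math
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, pred_labeled, pred_unlabeled, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a population mean.

    y_labeled:      measured outcomes on the small labeled set
    pred_labeled:   model predictions for those same labeled points
    pred_unlabeled: model predictions on the large unlabeled set
    """
    n, big_n = len(y_labeled), len(pred_unlabeled)
    # Per-point prediction error on labeled data; its mean estimates model bias.
    rectifier = [y - f for y, f in zip(y_labeled, pred_labeled)]
    estimate = fmean(pred_unlabeled) + fmean(rectifier)
    # Variance adds across the two independent samples.
    std_err = math.sqrt(variance(pred_unlabeled) / big_n + variance(rectifier) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_err, estimate + z * std_err
```

When predictions are accurate, the rectifier has low variance and the interval is driven mainly by the large unlabeled sample, which is one way to read the abstract's statement that more accurate predictions yield smaller intervals.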

5.
ArXiv ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37292483

ABSTRACT

Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.

6.
Proc Natl Acad Sci U S A ; 119(43): e2204569119, 2022 Oct 25.
Article in English | MEDLINE | ID: mdl-36256807

ABSTRACT

Many applications of machine-learning methods involve an iterative protocol in which data are collected, a model is trained, and then outputs of that model are used to choose what data to consider next. For example, a data-driven approach for designing proteins is to train a regression model to predict the fitness of protein sequences and then use it to propose new sequences believed to exhibit greater fitness than observed in the training data. Since validating designed sequences in the wet laboratory is typically costly, it is important to quantify the uncertainty in the model's predictions. This is challenging because of a characteristic type of distribution shift between the training and test data that arises in the design setting: one in which the training and test data are statistically dependent, as the latter is chosen based on the former. Consequently, the model's error on the test data (that is, the designed sequences) has an unknown and possibly complex relationship with its error on the training data. We introduce a method to construct confidence sets for predictions in such settings, which account for the dependence between the training and test data. The confidence sets we construct have finite-sample guarantees that hold for any regression model, even when it is used to choose the test-time input distribution. As a motivating use case, we use real datasets to demonstrate how our method quantifies uncertainty for the predicted fitness of designed proteins and can therefore be used to select design algorithms that achieve acceptable tradeoffs between high predicted fitness and low predictive uncertainty.
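
For orientation only, the sketch below shows ordinary split conformal prediction, a standard way to build model-agnostic prediction intervals from held-out residuals. It is not the method introduced in this paper: it assumes calibration and test points are exchangeable, which is exactly the assumption that breaks when the test inputs are chosen using the trained model. The function and argument names are hypothetical.

```python
import math


def split_conformal_interval(cal_targets, cal_predictions, new_prediction, alpha=0.1):
    """Standard split conformal interval around a point prediction.

    Valid only under exchangeability of calibration and test data,
    which does NOT hold in the design setting the abstract describes.
    """
    # Nonconformity scores: absolute residuals on held-out calibration data.
    scores = sorted(abs(y - f) for y, f in zip(cal_targets, cal_predictions))
    n = len(scores)
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_prediction - q, new_prediction + q)
```

The paper's contribution, as the abstract states, is to retain finite-sample guarantees when the test-time input distribution depends on the training data; the fixed quantile above makes no such correction.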


Subjects
Algorithms , Machine Learning , Feedback , Uncertainty , Molecular Conformation
7.
Nat Biotechnol ; 40(7): 1114-1122, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35039677

ABSTRACT

Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one probability density feature from modeling the evolutionary data. Within this approach, we find that a variational autoencoder-based probability density model showed the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.
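
The combination described above can be sketched as follows, assuming the site-specific features are a one-hot encoding of each residue and the density feature is a single precomputed score per sequence (for example, a log-likelihood from some evolutionary model). The encoding, the uniform penalty, the absence of an intercept, and all names are illustrative assumptions; the abstract does not specify these details.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq, density_score):
    """One-hot site-specific features plus one evolutionary density feature."""
    feats = []
    for residue in seq:
        feats.extend(1.0 if residue == aa else 0.0 for aa in ALPHABET)
    feats.append(density_score)  # hypothetical precomputed score
    return feats


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(features, targets, lam=1.0):
    """Closed-form ridge weights: (X^T X + lam I)^{-1} X^T y, no intercept."""
    d = len(features[0])
    gram = [
        [sum(row[i] * row[j] for row in features) + (lam if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(d)]
    return solve(gram, rhs)


def predict(weights, feats):
    """Linear prediction from fitted ridge weights."""
    return sum(w * x for w, x in zip(weights, feats))
```

A usage pattern would be to call `featurize` on each assayed variant together with its density score, fit with `ridge_fit`, and rank unseen variants with `predict`. For realistic sequence lengths a numerical library would replace the hand-written solver.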


Subjects
Machine Learning , Proteins , Proteins/chemistry , Proteins/genetics
8.
J Exp Biol ; 222(Pt 16), 2019 Aug 23.
Article in English | MEDLINE | ID: mdl-31371399

ABSTRACT

Zooplankton play critical roles in marine ecosystems, yet their fine-scale behavior remains poorly understood because of the difficulty in studying individuals in situ. Here, we combine biologging with supervised machine learning (ML) to propose a pipeline for studying in situ behavior of larger zooplankton such as jellyfish. We deployed the ITAG, a biologging package with high-resolution motion sensors designed for soft-bodied invertebrates, on eight Chrysaora fuscescens in Monterey Bay, using the tether method for retrieval. By analyzing simultaneous video footage of the tagged jellyfish, we developed ML methods to: (1) identify periods of tag data corrupted by the tether method, which may have compromised prior research findings, and (2) classify jellyfish behaviors. Our tools yield characterizations of fine-scale jellyfish activity and orientation over long durations, and we conclude that it is essential to develop behavioral classifiers on in situ rather than laboratory data.


Subjects
Hydrobiology/instrumentation , Life History Traits , Scyphozoa/physiology , Supervised Machine Learning , Zoology/instrumentation , Animals , Zooplankton/physiology